Method and system for generating synchronous multidimensional data streams from a one-dimensional data stream

ABSTRACT

A hardware approach and methodology for receiving a one-dimensional pixel data stream of scanned lines of a video frame and simultaneously generating therefrom two-dimensional parallel data used for real-time video processing in video systems. The parallel data comprise vertical, horizontal and diagonal pixel data centered on a current pixel and included in a window centered on said pixel.

The present invention relates to video processing systems, preferably for display devices, and particularly to a hardware approach and methodology for receiving a one-dimensional pixel data stream of scanned lines of a video frame and simultaneously generating therefrom multidimensional data used for real-time video signal processing (e.g., edge detection calculations) in video systems.

Many video processing algorithms require calculations performed within a rectangular block of pixels, moving in the direction of the scan, around a ‘base’ pixel on a pixel-by-pixel basis, meaning that the results of those calculations are each produced at a rate equal to the incoming pixel stream rate. Most often the calculations are done in two directions, horizontal and vertical (so-called two 1D), but the newest algorithms also need calculations performed in the diagonal directions of +45 and −45 degrees. These algorithms are called full 2D and are utilized, for example, for edge detection and sharpness enhancement functionality.

When the calculations are done in software (during simulation, for example, when performance speed is not a main consideration), a video frame including a ‘base’ pixel is stored in memory and the calculations are most often done using single or nested ‘FOR’ loops. An index or expression controlling the loop typically changes from 0 to ‘size of the block −1’ in any particular direction of interest. Software calculations, however, do not allow several processes to run in parallel on one processor. Consequently, the calculations are done sequentially and not in real time.
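For illustration only, the software approach described above might look like the following minimal Python sketch. The function name `kernel_lines`, the list-of-rows frame layout and the two diagonal names are assumptions made for this example and do not come from the specification; which diagonal corresponds to +45 or −45 degrees is deliberately left open.

```python
def kernel_lines(frame, row, col, size=13):
    """Collect the four pixel lines through the base pixel frame[row][col].

    `frame` is assumed to be a list of rows and `size` an odd kernel size.
    Every loop index runs over 'size of the block' positions, sequentially.
    """
    half = size // 2
    offsets = range(-half, half + 1)
    vertical = [frame[row + k][col] for k in offsets]
    horizontal = [frame[row][col + k] for k in offsets]
    # Diagonal running from top-left to bottom-right of the kernel.
    diag_down = [frame[row + k][col + k] for k in offsets]
    # Diagonal running from bottom-left to top-right of the kernel.
    diag_up = [frame[row - k][col + k] for k in offsets]
    return vertical, horizontal, diag_down, diag_up


# Small synthetic frame whose pixel value encodes its position.
frame = [[100 * r + c for c in range(20)] for r in range(20)]
v, h, d1, d2 = kernel_lines(frame, 10, 10, size=5)
print(v, h, d1, d2)
```

Each of the four lists is built one element at a time, which is the sequential behaviour that the hardware approach described below is intended to avoid.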

Hardware approaches that include a system for edge detection exist; however, they operate in one dimension (1D) and process data serially.

It would be highly desirable to implement a purely hardware approach that allows several processes to run in parallel, preferably in two dimensions. A hardware implementation of video algorithms enables real-time performance of many processes, thus enabling real-time sharpness enhancement with edge detection, for example, in two (2) dimensions.

It is thus an object of the invention to implement a purely hardware approach that allows several processes to run in parallel, preferably in two dimensions, from a one-dimensional data stream. A hardware implementation of video algorithms enables real-time performance of many processes, thus enabling real-time sharpness enhancement with edge detection, for example, in two (2) dimensions, at increased processing speed. The hardware approach enables real-time block-based 2D video processing performed by parallel operating hardware blocks, each calculating on one direction of pixels.

According to the principles of the invention, there is provided a hardware apparatus for real time processing of video images comprising:

means for receiving successive scanned lines of video data of a video frame to be displayed, each received line of video data comprising a one-dimensional stream of pixel data, and a predetermined number M of pixels from each of N successive lines forming a two-dimensional kernel that includes a horizontal base line including a base pixel;

vertical data processing means for successively storing pixel data from said successively received lines of a kernel and generating for successive output N pixel data in parallel form, said N parallel pixel data generated comprising vertically aligned pixel data from each said N lines including a vertical line of pixel data from said kernel including said base pixel;

horizontal data processing means for successively receiving pixel data from a single line of each successive vertically aligned parallel pixel data output from said vertical data processing means, said received pixel data corresponding to said base line including said base pixel, said horizontal data processing means generating for successive output M pixel data in parallel form comprising pixel data belonging to a horizontal base line of said kernel;

diagonal data processing means for successively receiving pixel data from each successive vertically aligned parallel pixel data output from said vertical data processing means and generating for successive output (in general the number of pixels in the diagonal will be the smallest of M and N) pixel data in parallel form comprising pixel data belonging to first and second diagonals of said kernel, said first and second diagonal including said base pixel; and,

timing means for enabling synchronized output of a vertical line parallel data, horizontal base line parallel data and first and second diagonal parallel data each comprising said base pixel of said kernel, to enable subsequent real-time edge detection of a video image at said base pixel.

The objects, features and advantages of the present invention will become apparent to one skilled in the art, in view of the following detailed description taken in combination with the attached drawings, in which:

FIG. 1 depicts a generic block diagram 10 of the hardware approach for real-time 2D video processing according to the invention;

FIG. 2 is a circuit diagram depicting components of the vertical source block ‘11’ depicted in FIG. 1;

FIG. 3 is a circuit diagram depicting components of the vertical delay block ‘301’ depicted in FIG. 2;

FIG. 4 is a circuit diagram depicting the line memory components comprising the vertical delay block memory module 101 depicted in FIG. 3;

FIG. 5 illustrates the timing of the line memory read and write pulses operating to control acquisition of data for the kernel;

FIG. 6 illustrates a detailed diagram of the horizontal delay circuit 22 depicted in FIG. 1;

FIG. 7 depicts the organization of the diagonal delay circuit 33 of FIG. 1 that may be used to generate the diagonal data for the kernel;

FIG. 8 illustrates an exemplary circuit for ensuring the vertical data of the kernel is output at the multiplexer in the correct sequence (this is the ‘inside’ of block 302 of FIG. 2); and,

FIG. 9 illustrates an example display 98 comprising pixels of a video frame at a predetermined resolution, and depicting a kernel 100 about a base pixel 99 therein.

FIG. 1 depicts a generic block diagram 10 of the hardware approach for real-time 2D video processing according to the invention. For purposes of description, the present invention is implemented in a high definition television system implementing, for example, the 720P (Progressive) broadcasting video standard. In the 720P standard there are 720 active video lines, with each line having 1280 active pixels; however, it is understood that additional information, including horizontal and vertical blanking intervals, increases the total number of pixels (e.g., 1650×750). According to the typical television video broadcasting standard, the video image data enters the system line by line in the vertical direction from top to bottom of the video frame, with line scanning performed left to right in the horizontal direction. FIG. 1 depicts video image data entering the system 10 as a one-dimensional data stream 12.

In the video processing algorithm according to the invention, calculations are required to be performed in four directions (horizontal, vertical and two diagonal (e.g., +/−45°)) within a block. This block of pixels, alternately referred to herein as a kernel, is of a size M×N, for example, where M is the kernel's horizontal size and N is the kernel's vertical size. Note that, for purposes of description, M=N and, as shown in FIG. 1, an example 13×13 video image block is depicted. It is understood, however, that the invention is applicable to other M×N 2D kernel sizes, preferably a size where M and N are odd values, since the kernel is symmetrical around a base pixel where the edge determination is performed.

In the exemplary system 10 depicted in FIG. 1, there are provided four ‘arithmetic’ blocks labeled ‘A’, ‘B’, ‘C’, ‘D’ that perform the processing calculations in parallel. Preferably, each of these blocks ‘A’, ‘B’, ‘C’, ‘D’ performs the calculations in a single direction of pixels, e.g., vertical (block A), horizontal (block B), +/−45° (blocks C, D), respectively, and determines the existence of an edge at the base pixel. Preferably, if an edge is found, each of these blocks additionally determines edge parameters such as width, dynamic range, transition direction, etc. Thus, in FIG. 1, in order for these ‘arithmetic’ calculator blocks to be identical and work in parallel (also synchronously), the data streams entering these blocks must have the same format and be synchronized according to a common time clock 15.

To achieve the above similarity of the data streams for parallel processing according to the hardware realization of the invention, a pixel rearrangement structure is provided. Such a structure comprises a vertical source block ‘11’ (FIG. 1) for receiving successive scanned video data lines according to the typical broadcasting standard, each received line comprising a one-dimensional data stream 12 of the video frame. After receiving an amount of data from the video lines, the vertical source block ‘11’ builds an M×N (e.g., 13×13) pixel block or kernel which is processed for generating the parallel streams used by the calculator blocks ‘A’, ‘B’, ‘C’, ‘D’. As will be explained in further detail, the vertical source block ‘11’ includes a vertical delay block ‘301’ and a line multiplexer ‘302’, configured in the manner depicted in FIG. 2. The vertical delay block ‘301’ comprises a memory module ‘101’ and a memory controller ‘102’ configured in the manner depicted in FIG. 3. The memory module ‘101’ includes N line memories ‘201’ configured in the manner depicted in FIG. 4. As will be described, the vertical source block ‘11’ including line memories is necessary because calculating the edge at a base pixel within the kernel requires information from lines that have already been received and from lines not yet received. Particularly, the memory in vertical source block ‘11’ is necessary to store the video pixel information for the lines of the kernel: in the exemplary case of a 13×13 kernel, pixel data from each of six (6) lines 20 up (before) the video data line 30 including the base pixel, which have already been received, and video pixel data for six (6) successive lines 40 down (after) the line 30 including the base pixel, which will subsequently be received. Thus, in the example embodiment, the 13 lines of pixel information are stored in the line memories residing in the vertical delay block 301 of FIG. 2 in order to build the kernel.

As now described with reference to FIGS. 2 and 3, the vertical delay block 301 includes memory controller 102 and memory module 101. The vertical delay block memory module 101 includes the line memories such as shown in FIG. 4. The line memories' performance is controlled by the line memory controller 102, which receives control signals including the vertical blank (V_blank) signal 18, the horizontal blank (H_blank) signal 17 and the clock 15, in the following manner: after the vertical blanking interval, i.e., receipt of the V_blank reset pulse 18, the received H_blank pulses 17 are counted so that it is known exactly where in the vertical direction of a frame the current active video line information is being received. Thus, after the vertical blanking interval, and following receipt of the H_blank pulse corresponding to the vertical location in the video frame having the 1st active line of a kernel for a desired base pixel, all the 1st active video line data of that kernel is written into line memory_1 201, labeled U1 in FIG. 4. In the example embodiment of a 13×13 kernel described herein, this location is six (6) lines up from the line 30 that includes the base pixel as shown in FIG. 1. Immediately following receipt of the next H_blank pulse 17, the 2nd line of the kernel (e.g., five (5) lines up from the base line 30 in the example embodiment) is written into line memory_2 201, labeled U2 in FIG. 4, and this process continues until the Nth line is written into memory N, labeled U13 in FIG. 4 (e.g., six (6) lines below the base line 30 in the example embodiment). It is understood that the N+1th line is written into memory 1, the N+2th line into memory 2, etc., as the video scanning progresses. In the preferred embodiment, the reading operation starts with the start of the Nth line, since by then all data from lines 1 through N−1 of the kernel is stored and available for processing. Thus, the data from memories 1 to N−1 are read in parallel during the writing of the data of the Nth active video line. Then, during writing of the N+1th active video line, the line memories 2 to N are read; during the N+2th line, line memory 1 and line memories 3 to N are read, etc. Note that the line memory which is in the active ‘write’ state during a particular line time is not read out during that line time, as illustrated in FIG. 5.
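The write and read rotation described above can be sketched as follows. This is a minimal illustration, assuming 1-based active line numbers counted from the first line of the first kernel; the function name `line_memory_schedule` is hypothetical and not part of the specification.

```python
def line_memory_schedule(line_no, n_mem=13):
    """Return (write_memory, read_memories) for a 1-based active line number.

    Line k is written into memory ((k - 1) mod N) + 1. Reading starts with
    the Nth line, and the memory being written is never read in that line.
    """
    if line_no < 1:
        raise ValueError("line_no is 1-based")
    write_mem = (line_no - 1) % n_mem + 1
    if line_no < n_mem:
        return write_mem, []  # kernel not yet filled, nothing is read
    read_mems = [m for m in range(1, n_mem + 1) if m != write_mem]
    return write_mem, read_mems


for line in (13, 14, 15):
    print(line, line_memory_schedule(line))
```

For lines 13, 14 and 15 this reproduces the sequence given in the text: memory 13 written while 1 to 12 are read, then memory 1 written while 2 to 13 are read, then memory 2 written while 1 and 3 to 13 are read.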

Particularly, as shown in FIG. 3, memory control block 102 generates respective read pulses 48 and write pulses 49 for controlling read and write operations of the line memories 201 (e.g., U1-U13 of FIG. 4) of the memory module 101. The timing of these line memory write pulses, labeled WR1-WR13, is depicted in the exemplary embodiment of FIG. 5, with the first active line write pulse WR1 (for writing data of active video line 1 of the kernel) shown immediately following receipt of a V_blank pulse, and the next successive active line write pulse WR2 triggered at the falling edge of the prior (WR1) pulse. As may be known to skilled artisans, this process may be controlled by an H_blank pulse counter. This process is repeated for each subsequent write pulse until WR13 is generated, as shown in FIG. 5. It is understood that in FIG. 5, the duration of each pulse corresponds to one line time. As depicted in FIG. 5, while active line N (e.g., N=13) is being written as depicted by pulse 59, the data at line memories 1 through N−1 are being simultaneously read (in parallel) as indicated by the triggering of respective read pulses RD1-RD12 depicted as lines 48. In the next kernel shift, as new line N+1 is being written to line memory 1 as depicted by WR1 pulse 69, the data at line memories 2 through N are being simultaneously read (in parallel) as indicated by the active high state of respective read lines 48 (RD2-RD12) and the triggering of read pulse RD13 depicted as pulse 58. It is understood that, for the write duration to line memory 1 for the new line N+1, the reading of line memory 1 is prevented by the state change depicted as the active low state 70. The process continues as each subsequent line is being written into the line memories and the data lines 48 are being read in parallel. Thus, for the next kernel shift, line N+2 is written into line memory 2 as controlled by pulse 79, and the read pulses for line memories 1 and 3 through N are active and the corresponding data stored therein is read out in parallel. It is understood that reading of line memory 2 is now prevented by the state change depicted as the active low state 71, etc. It should be understood that the duration of the ‘read’ and ‘write’ pulses may also be equal to the active part of the video line, thus reducing the required memory length, i.e., the blanking part is not stored. This will require a more sophisticated ‘Memory control’ block. However, if this approach is taken, the ‘border’ pixels from the 1st to the 5th on all sides of the video frame will have a non-symmetrical kernel. Ideally, for these pixels the data is ‘mirrored’, i.e., available data is symmetrically copied to the missing locations, which will require even more sophisticated controls. In the present example described, the data from the blanking part may be used in those ‘border’ kernels, which is acceptable for most consumer systems because of ‘overscan’, i.e., the visible part of the image is smaller, by a couple of pixels, than the total picture resolution.
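As a hedged illustration of the ‘mirroring’ option mentioned above, the following Python sketch maps an out-of-range pixel coordinate back into the active area by reflection. The specification does not say which reflection convention would be used; this sketch assumes reflection about the edge pixel without repeating it, and the name `mirror_index` is hypothetical.

```python
def mirror_index(i, length):
    """Reflect index i into the range 0..length-1 (edge pixel not repeated)."""
    if length == 1:
        return 0
    period = 2 * (length - 1)
    i = abs(i) % period
    return i if i < length else period - i


# Horizontal kernel positions for a base pixel at column 2 of a 1280-pixel
# line, with a 13-wide kernel (six pixels on each side of the base pixel).
columns = [mirror_index(2 + k, 1280) for k in range(-6, 7)]
print(columns)
```

In hardware this remapping would have to be realized by the memory control logic, which is the added complexity the text refers to.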

Referring back to FIG. 2, the line multiplexer block 302 receives the stored vertical data 50 which is output in parallel from the line memories 201 of the vertical delay block memory module 101 (FIG. 3). Preferably, the line multiplexer 302 re-arranges the line sequence so that the ‘arithmetic’ block always receives the current incoming line as the bottom line (e.g., N=13 or base +6), the line stored in the previous line period as the line above it (e.g., N=12 or base +5), and so on, such that the line stored N−1 line times before (e.g., line N=1 or base −6) appears as the topmost line, regardless of which particular line memory the data is read out from. Thus, despite the shifting of the write and read points under the memory control described with respect to FIG. 5, the line multiplexer 302 ensures that the data is always output in the correct sequence and that the block (kernel) moves smoothly in the vertical direction. For an example embodiment, as shown in FIG. 8, this operation (as well as others) may be coded in HDL and may include a simple counter device 77 receiving the H_blank 17, V_blank 18 and clock 15 signals to generate an output 78 that controls the multiplexer operations necessary to achieve this.
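The re-ordering performed by the line multiplexer can be sketched as a simple index calculation. This is an illustrative model only: it assumes the same 1-based line numbering as above and assumes that the bottom line is taken from the live input (marked with the hypothetical label "in"), since the memory being written is not read during that line. The function name `mux_order` is likewise hypothetical.

```python
def mux_order(line_no, n_mem=13):
    """Sources of the N output lines, listed from top (oldest) to bottom.

    Entries are line memory numbers 1..N, followed by "in" for the line
    that is arriving (and being written) during this line time.
    """
    if line_no < n_mem:
        raise ValueError("kernel is not filled before line N")
    oldest = line_no - n_mem  # zero-based position of the oldest stored line
    stored = [(oldest + k) % n_mem + 1 for k in range(n_mem - 1)]
    return stored + ["in"]


print(mux_order(13))  # memories 1..12, then the incoming line 13
print(mux_order(15))  # memories 3..13, then 1, then the incoming line 15
```

A counter that advances on each H_blank pulse, as in the counter device 77, would be enough to produce the rotating offset used here.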

It should be understood that the vertical source block ‘11’ processing is a real-time, continuous process such that the base pixel, and consequently the kernel, and the availability of 2D pixel information therein for determining edges at base pixels, constantly changes with each successive scan in the vertical direction as performed by the video processing system of a particular display device.

Having performed the real-time process described herein with respect to FIGS. 2-5, a vertical line of pixels is now available with the top line corresponding to the base line −6 lines of the block (kernel) and the bottom line corresponding to the base line +6 lines, for the example embodiment described. From this vertical line of pixels, the generation of the horizontal and diagonal lines is performed in real time as follows:

Particularly, as depicted in FIG. 1, the base pixels (at location N=7 of the pixel kernel) received from each successive vertical line of the kernel form a horizontal line. This horizontal line, which is the center line of the kernel in the vertical direction, is called the base line, and it contains all the ‘base’ pixels. To create the data sequence around the ‘base’ pixel in the horizontal direction, the data of this base line is input from bus 16 to horizontal delay circuit 22, where the pixels are delayed so that the base pixel of interest corresponds to the middle of the horizontal line. FIG. 6 illustrates a detailed diagram of the horizontal delay circuit 22, which comprises a shift register with serial load and parallel unload including M (e.g., M=13) delay circuits connected serially, with each delay comprising one D flip-flop 401. Each of the registers has an output 402 to the corresponding ‘arithmetic’ block B as shown in FIG. 1.
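The serial-load, parallel-unload behaviour of the horizontal delay circuit can be modelled in software as below. This is only a behavioural sketch; the class name `HorizontalDelay` and its methods are assumptions for the example, and each list element stands in for the output of one D flip-flop after a clock edge.

```python
from collections import deque


class HorizontalDelay:
    """Behavioural model of an M-stage serial-in, parallel-out shift register."""

    def __init__(self, m=13):
        self.stages = deque([None] * m, maxlen=m)

    def clock(self, pixel):
        """Shift in one base-line pixel and return all M tap outputs."""
        self.stages.appendleft(pixel)  # the oldest pixel drops off the far end
        return list(self.stages)


reg = HorizontalDelay(m=5)
for pixel in range(10):
    taps = reg.clock(pixel)
print(taps)       # [9, 8, 7, 6, 5]
print(taps[2])    # 7, the centre tap, i.e. the base pixel of this window
```

With M=5 the centre tap holds pixel 7 while the taps on either side hold its two left-hand and two right-hand neighbours, so the base pixel sits in the middle of the parallel output as the text describes.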

To create the two diagonal (e.g., +/−45°) sequences, each output of the vertical source block 11 is fed as signals 19 into a diagonal source block 33 in FIG. 1. As depicted in FIG. 7, diagonal source block 33 comprises an M×N configuration of shift registers, each made up of one-clock delays ‘501’. It is understood that, in a generic case, when M≠N (not a square kernel), the length of the diagonal will be the smallest of M and N. Consequently, all the following formulas would be changed accordingly, as would be within the purview of skilled artisans. The one-clock delays 501 of each shift register are connected serially to delay the data by one clock cycle per stage, with the number of delays in the first row, from the 1st register 505 to the Nth register 510, being M, the number of delays in the second row, from register 515 to the N−1th register 520, being M−1, etc. The center row comprises a serial connection of (M+1)/2 delays, i.e., a serial connection from register 525 to the (N+1)/2th register 530 (seven delays in the example embodiment of M=N=13). To create the diagonal sequence in the +45 degrees direction, the outputs 550a through 550g of the last one-clock delay of shift registers 1 to (M+1)/2 are taken together with the output 560a of the first delay of the Nth shift register, the output 560b of the second delay of the N−1th shift register, the output 560c of the third delay of the N−2th shift register, and so on, until the output 560f of shift register (M+3)/2 is obtained. Likewise, for the −45 degrees diagonal direction, the outputs 570a-570g of the last delays of shift registers N to (M+1)/2 (register 530) are taken together with the output 580a of the first delay of the 1st shift register 505, the output 580b of the second delay of the 2nd shift register, the output 580c of the third delay of the 3rd shift register, and so on, through output 580f. As described herein, the outputs 550a-550g, 560a-560f and 570a-570g, 580a-580f of the respective two diagonal (i.e., +/−45°) sequences generated by the diagonal source block 33 are available as 2D information synchronized for simultaneous parallel output to the edge detector calculator blocks ‘C’ and ‘D’ as depicted in FIG. 1.
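The tap selection described above can be checked with a small behavioural model. The sketch below assumes a square kernel (M=N) and 0-based row numbering, gives row i a shift register of max(i, N−1−i)+1 one-clock delays (so the lengths run N, N−1, ..., (N+1)/2, ..., N−1, N), and reads one tap per row for each diagonal. The class name `DiagonalSource` and the labels `main` and `anti` are hypothetical; which of the two corresponds to +45 or −45 degrees is not asserted here.

```python
from collections import deque


class DiagonalSource:
    """Behavioural model of a triangular bank of shift registers (M = N = n)."""

    def __init__(self, n=13):
        self.n = n
        lengths = [max(i, n - 1 - i) + 1 for i in range(n)]
        self.rows = [deque([None] * k, maxlen=k) for k in lengths]

    def clock(self, column):
        """Shift in one vertical line of n pixels; return both diagonals."""
        for reg, pixel in zip(self.rows, column):
            reg.appendleft(pixel)
        n = self.n
        # Row i, tap n-1-i: runs from the top-left to the bottom-right corner.
        main = [self.rows[i][n - 1 - i] for i in range(n)]
        # Row i, tap i: runs from the top-right to the bottom-left corner.
        anti = [self.rows[i][i] for i in range(n)]
        return main, anti


# Feed columns 0..6 of a frame whose pixel value is 10 * row + column.
src = DiagonalSource(n=5)
for c in range(7):
    main, anti = src.clock([10 * r + c for r in range(5)])
print(main)  # [2, 13, 24, 35, 46]
print(anti)  # [6, 15, 24, 33, 42]
```

Both diagonals pass through the value 24 (row 2, column 4), the base pixel of the 5×5 window spanning columns 2 to 6. For rows above the centre the `main` tap is the last delay of the row and for rows below the centre it is the first, second, etc. delay, which matches the output pattern given in the text.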

Further in FIG. 1, it should be understood that a vertical data delay block ‘44’ is provided in order to delay the output of the vertical source block ‘11’ by (M+1)/2 clock cycles, so as to align the 2D vertical source parallel data with the 2D horizontal parallel data and the 2D diagonal parallel data outputs for simultaneous input to the arithmetic blocks ‘A’ to ‘D’.
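To illustrate why a delay of (M+1)/2 clock cycles lines up the vertical data with the centre of the horizontal line, the following sketch models both paths as registered delay lines, assuming that every tap is taken after a flip-flop. The function name `aligned_outputs` and the test frame are assumptions for this example only.

```python
from collections import deque


def aligned_outputs(columns, m=13):
    """Yield (delayed vertical line, horizontal taps) after each clock.

    `columns` is a sequence of vertical pixel lines, one per clock. The
    vertical path is delayed by (m + 1) // 2 registered stages; the
    horizontal path is an m-stage shift register fed with the base row.
    """
    centre_stage = (m + 1) // 2
    base_row = len(columns[0]) // 2
    horiz = deque([None] * m, maxlen=m)
    vert_delay = deque([None] * centre_stage, maxlen=centre_stage)
    for col in columns:
        horiz.appendleft(col[base_row])
        vert_delay.appendleft(list(col))
        yield vert_delay[-1], list(horiz)


# Frame whose pixel value is 10 * row + column, 5 rows, columns 0..6.
cols = [[10 * r + c for r in range(5)] for c in range(7)]
vertical, horizontal = list(aligned_outputs(cols, m=5))[-1]
print(vertical)    # [4, 14, 24, 34, 44]
print(horizontal)  # [26, 25, 24, 23, 22]
```

After the last clock both outputs are centred on the same base pixel (value 24), which is the alignment the vertical data delay block is described as providing.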

While there has been shown and described what is considered to be preferred embodiments of the invention, it will, of course, be understood that various modifications and changes in form or detail could readily be made without departing from the spirit of the invention. It is therefore intended that the invention be not limited to the exact forms described and illustrated, but should be construed to cover all modifications that may fall within the scope of the appended claims.

1. A hardware apparatus for generating synchronous multidimensional data streams from a one-dimensional data stream comprising: means for receiving successive scanned lines of video data of a video frame to be displayed, each received line of video data comprising a one-dimensional stream of pixel data, and a predetermined number M of pixels from each of N successive lines forming a two-dimensional kernel that includes a horizontal base line including a base pixel; vertical data processing means for successively storing pixel data from said successively received lines of a kernel and generating for successive output N pixel data in parallel form, said N parallel pixel data generated comprising vertically aligned pixel data from each said N lines including a vertical line of pixel data from said kernel including said base pixel; horizontal data processing means for successively receiving pixel data from a single line of each successive vertically aligned parallel pixel data output from said vertical data processing means, said received pixel data corresponding to said base line including said base pixel, said horizontal data processing means generating for successive output M pixel data in parallel form comprising pixel data belonging to a horizontal base line of said kernel; diagonal data processing means for successively receiving pixel data from each successive vertically aligned parallel pixel data output from said vertical data processing means and generating for successive output pixel data in parallel form comprising pixel data belonging to first and second diagonals of said kernel, said first and second diagonal including said base pixel; and, timing means for enabling synchronized output of a vertical line parallel data, horizontal base line parallel data and first and second diagonal parallel data each comprising said base pixel of said kernel, to enable subsequent real-time processing of a video image at said base pixel.
2. The hardware apparatus according to claim 1, wherein the kernel comprises an M×N matrix of pixels symmetrical about said base pixel.
3. The hardware apparatus according to claim 2, wherein M=N.
4. The hardware apparatus according to claim 2, wherein said timing means includes means for delaying said output of said vertical data processing means by (M+1)/2 clock cycles to align the vertical line parallel data including said base pixel with said horizontal base line parallel data and diagonal parallel data outputs.
5. The hardware apparatus according to claim 2, wherein said vertical data processing means comprises: N memory storage devices for successively storing pixel data from a corresponding line of said N successively received scanned video lines; and, memory controller for controlling writing of received one-dimensional scanned pixel data line to a respective said memory storage device and, reading of data from each of said N memory storage devices to form said N pixel data parallel outputs, each N pixel data parallel output generated in a successive clock cycle.
6. The hardware apparatus according to claim 5, wherein said memory controller includes means for enabling simultaneous reading of data from each of a 1st memory storage device through said N−1th memory storage device as pixel data of said Nth scanned video line is written to said Nth memory storage device.
7. The hardware apparatus according to claim 6, wherein said kernel is successively shifted for processing at a new base pixel at receipt of each successive scanned line after said Nth video line, said memory controller enabling writing of pixel data of a received N+1th scanned video line in said 1st memory storage device while enabling simultaneous reading of data from each of a 2nd memory storage device through said Nth memory storage device.
8. The hardware apparatus according to claim 6, wherein at each kernel shift, each successive input line N+X is read into a corresponding numbered line memory X of said N memory storage devices, where 1≦X<N, while corresponding data stored in remaining memory storage devices exclusive of said line memory X is read out in parallel.
9. The hardware apparatus according to claim 8, wherein said vertical data processing means further comprises: means for receiving the data read from each of said N memory storage devices; and, means for re-arranging the line sequence so that the vertical line parallel data output from said vertical data processing means is arranged such that the received incoming line X received in sequence (where 1≦X<N) is output as a corresponding line X of said N parallel output lines regardless of which particular line memory storage device the corresponding pixel data is read out from.
10. The hardware apparatus according to claim 9, wherein said means for re-arranging the line sequence includes a multiplexer device for ensuring that the data is output always at the correct sequence and that a kernel shifts in the vertical direction.
11. The hardware apparatus according to claim 10, wherein said means for re-arranging the line sequence further comprises a counter device for receiving H_blank pulses at its clock input to ensure that the N parallel output line data is output at the correct sequence.
12. The hardware apparatus according to claim 1, wherein the number of pixel data output in parallel form from said diagonal data processing means is the smallest of M and N.
13. A method for making video data available for real time processing comprising the steps of: a) receiving successive scanned lines of video data of a video frame to be displayed, each received line of video data comprising a one-dimensional stream of pixel data, and a predetermined number M of pixels from each of N successive lines forming a two-dimensional kernel that includes a horizontal base line including a base pixel; b) successively storing pixel data from said successively received lines of a kernel and generating for successive output N pixel data in parallel form, said N parallel pixel data generated comprising vertically aligned pixel data from each said N lines including a vertical line of pixel data from said kernel including said base pixel; c) successively receiving pixel data from a single line of each successive vertically aligned parallel pixel data output, said received pixel data corresponding to said base line including said base pixel; d) generating for successive output M pixel data in parallel form comprising pixel data belonging to a horizontal base line of said kernel; e) successively receiving pixel data from each successive vertically aligned parallel pixel data output from said vertical data processing means; f) generating for successive output pixel data in parallel form comprising pixel data belonging to first and second diagonals of said kernel, said first and second diagonal including said base pixel; and, g) synchronizing output of a vertical line parallel data, horizontal base line parallel data and first and second diagonal parallel data each comprising said base pixel of said kernel, to enable subsequent real-time processing of a video image at said base pixel.
14. The method according to claim 13, wherein said step b) of successively storing pixel data from said successively received lines of a kernel includes the step of: successively storing pixel data from a line of said N successively received scanned video lines in a corresponding device of N memory storage devices; and, writing a received one-dimensional scanned pixel data line to a respective said memory storage device; and, reading data from each of said N memory storage devices to form said N pixel data parallel outputs, each N pixel data parallel output generated in a successive clock cycle.
15. The method according to claim 13, including the steps of enabling simultaneous reading of data from each of a 1st memory storage device through said N−1th memory storage device while writing of pixel data of said Nth scanned video line into said Nth memory storage device.
16. The method according to claim 15, wherein said kernel is successively shifted for video processing at a new base pixel at receipt of each successive scanned line after said Nth video line, said method including the steps of: writing pixel data of a received N+1th scanned video line into said 1st memory storage device; and simultaneously reading of data from each of a 2nd memory storage device through said Nth memory storage device.
17. The method according to claim 15, wherein at each kernel shift, the steps of: reading each successive input line N+X into a corresponding numbered line memory X of said N memory storage devices, where 1≦X<N, and, simultaneously reading out in parallel the corresponding data stored in remaining memory storage devices exclusive of said line memory X.
18. The method according to claim 17, further comprising the steps of: receiving the data read from each of said N memory storage devices prior to parallel output; and, re-arranging the line sequence so that the vertical line parallel data output is arranged such that the received incoming line X received in sequence (where 1≦X<N) is output as a corresponding line X of said N parallel output lines regardless of which particular line memory storage device the corresponding pixel data is read out from.
19. The method according to claim 13, wherein the number of pixel data output in parallel form comprising pixel data belonging to first and second diagonals of said kernel is the smallest of M and N.
20. A video display device including hardware apparatus for making video data available for real time processing, said apparatus comprising: means for receiving successive scanned lines of video data of a video frame to be displayed, each received line of video data comprising a one-dimensional stream of pixel data, and a predetermined number M of pixels from each of N successive lines forming a two-dimensional kernel that includes a horizontal base line including a base pixel; vertical data processing means for successively storing pixel data from said successively received lines of a kernel and generating for successive output N pixel data in parallel form, said N parallel pixel data generated comprising vertically aligned pixel data from each said N lines including a vertical line of pixel data from said kernel including said base pixel; horizontal data processing means for successively receiving pixel data from a single line of each successive vertically aligned parallel pixel data output from said vertical data processing means, said received pixel data corresponding to said base line including said base pixel, said horizontal data processing means generating for successive output M pixel data in parallel form comprising pixel data belonging to a horizontal base line of said kernel; diagonal data processing means for receiving pixel data from each successive vertically aligned parallel pixel data output from said vertical data processing means and generating for successive output pixel data in parallel form comprising pixel data belonging to first and second diagonals of said kernel, said first and second diagonal including said base pixel; and, timing means for enabling synchronized output of a vertical line parallel data, horizontal base line parallel data and first and second diagonal parallel data each comprising said base pixel of said kernel, to enable subsequent real-time processing of a video image at said base pixel.